Using visual and semantic features for anti-spam filters

Authors

  • F. Gargiulo
  • A. Picariello
  • C. Sansone
Abstract

It is well known that Unsolicited Commercial Email (UCE), commonly known as spam, is becoming a serious problem for the email accounts of individual users, small companies and large institutions. Spam can seriously disrupt normal user activity: users are forced to sift through their mailboxes to find the relatively few messages of interest, which wastes time and bandwidth and consumes storage space. Moreover, spam often carries unsuitable content (such as advertisements for pornographic material) that may be illegal to show to minors. Various countermeasures to spam have therefore been proposed, following either a regulatory or a technical approach. The legislative approach has not produced the desired results, so a variety of technical approaches have been implemented in the anti-spam filters currently used to detect spam content [8]. In the past, researchers have addressed this problem as a text classification or categorization problem. However, as spammers' techniques continue to evolve and the genre of email content becomes more and more diverse, keyword-based anti-spam approaches alone are no longer sufficient. Different techniques are used to analyze the mail text, and the majority are learning-based. If spam detection is treated as a binary classification problem, several algorithms from learning theory can be used, such as Bayesian algorithms [5] or Support Vector Machines (SVM) [6]. Using the acquired knowledge, these systems discriminate on the synthetic features in order to reject mail considered spam. Note that these approaches generally do not take the semantic content of emails into account. In this paper, we propose a novel anti-spam system that uses visual clues, in addition to semantic analysis of the email body, to determine whether a message is spam.
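As an illustration of the binary-classification framing mentioned above, here is a minimal sketch of a multinomial naive Bayes text classifier with Laplace smoothing. It is not the method of the cited works [5], [6]; the function names `train` and `classify`, the whitespace tokenizer and the toy messages are all assumptions made for this example.

```python
import math
from collections import Counter

LABELS = ("spam", "ham")

def train(messages):
    """Build word and document counts from (text, label) pairs.

    Assumes every label in LABELS occurs at least once in `messages`.
    """
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the label with the highest log-posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in LABELS:
        # Log prior: fraction of training messages carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothed word likelihood.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training data, invented for this example.
TRAINING = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch tomorrow at noon", "ham"),
]
MODEL = train(TRAINING)
```

A classifier of this kind looks only at word frequencies, which is why, as the abstract notes, it does not capture the semantic content of a message.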
Figure 1: System Architecture

As shown in Figure 1, we propose a system that integrates image and text semantic analysis using a set of hierarchical decision systems: in particular, the different modules of the algorithm are activated only when the previous phases are not sufficient to give a positive answer. We thus apply to the text contained in emails several semantic analysis and Natural Language Processing (NLP) techniques, i.e.: (i) Latent Dirichlet Allocation (LDA) [7], in order to define a distribution of keywords for semantically categorizing email contents; (ii) Latent
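The hierarchical activation described above, in which a later module runs only when earlier ones cannot decide, can be sketched as a simple cascade. This is only an illustration under stated assumptions: the stage names, their order, the thresholds, the `image_spam_score` field and the example addresses are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Optional, Tuple

# A stage returns "spam" or "ham" when it can decide, or None to defer.
Stage = Callable[[dict], Optional[str]]

def blacklist_stage(email: dict) -> Optional[str]:
    # Hypothetical cheap first check: sender on a known-bad list.
    if email.get("sender") in {"promo@spam.example"}:
        return "spam"
    return None

def text_stage(email: dict) -> Optional[str]:
    # Toy stand-in for semantic text analysis: count suspicious keywords.
    words = email.get("body", "").lower().split()
    hits = sum(w in {"viagra", "lottery", "winner"} for w in words)
    if hits >= 2:
        return "spam"
    if words and hits == 0:
        return "ham"
    return None  # ambiguous or empty text: defer to the next stage

def image_stage(email: dict) -> Optional[str]:
    # Toy stand-in for visual analysis: uses a precomputed score if present.
    score = email.get("image_spam_score")
    if score is None:
        return None
    return "spam" if score > 0.5 else "ham"

STAGES: List[Tuple[str, Stage]] = [
    ("blacklist", blacklist_stage),
    ("text", text_stage),
    ("image", image_stage),
]

def run_cascade(email: dict, stages=STAGES, default: str = "ham"):
    """Run stages in order and stop at the first one that decides."""
    for name, stage in stages:
        verdict = stage(email)
        if verdict is not None:
            return verdict, name
    return default, "default"
```

Returning the name of the deciding stage alongside the verdict makes it easy to check that the more expensive modules run only when the cheaper ones defer.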

Similar resources

An Effective Model for SMS Spam Detection Using Content-based Features and Averaged Neural Network

In recent years, there has been considerable interest in using short message service (SMS) as one of the essential and straightforward communication services on mobile devices. The increased popularity of this service has also increased the number of attacks on mobile devices, such as SMS spam messages. SMS spam messages constitute a real problem to mobile subscribers; this worries telecomm...

Spam Filtering Based On The Analysis Of Text Information Embedded Into Images

In recent years anti-spam filters have become necessary tools for Internet service providers to face up to the continuously growing spam phenomenon. Current server-side anti-spam filters are made up of several modules aimed at detecting different features of spam e-mails. In particular, text categorisation techniques have been investigated by researchers for the design of modules for the analys...

A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection

Email is one of the fastest and most economical means of communication. The growing number of email users has led to an increase in spam in recent years. Spam not only costs users money, time and bandwidth, but has also become a risk to the efficiency, reliability, and security of networks. Spam developers are always trying to find ways to evade existing filters, therefore new filters to de...

How Anti-Spam Measures Impact on Your Email

Unsolicited commercial email, commonly referred to as spam, is now recognized as a major problem causing considerable costs (e.g., [Ferris Research 03]) and impacting on how people use email (e.g., [Fallows 03]). Anti-spam measures, such as email filters and server-based block lists, have become ubiquitous. There is a growing body of empirical and anecdotal evidence suggesting that apart from t...

Image spam filtering using textual and visual information

In this paper we focus on the so-called image spam, which consists in embedding the spam message into images attached to e-mails to circumvent statistical techniques based on the analysis of body text of e-mails (like the “bayesian filters”), and in applying content obscuring techniques to such images to make them unreadable by standard OCR systems without compromising human readability. We arg...

Publication date: 2007